Goto

Collaborating Authors

 user input




Optimizing Prompts for Text-to-Image Generation

Neural Information Processing Systems

Well-designed prompts can guide text-to-image models to generate amazing images. However, the performant prompts are often model-specific and misaligned with user input. Instead of laborious human engineering, we propose prompt adaptation, a general framework that automatically adapts original user input to model-preferred prompts. Specifically, we first perform supervised fine-tuning with a pretrained language model on a small collection of manually engineered prompts. Then we use reinforcement learning to explore better prompts. We define a reward function that encourages the policy to generate more aesthetically pleasing images while preserving the original user intentions. Experimental results on Stable Diffusion show that our method outperforms manual prompt engineering in terms of both automatic metrics and human preference ratings. Moreover, reinforcement learning further boosts performance, especially on out-of-domain prompts.


GaussianCut: Interactive segmentation via graph cut for 3D Gaussian Splatting

Neural Information Processing Systems

We introduce GaussianCut, a new method for interactive multiview segmentation of scenes represented as 3D Gaussians. Our approach allows for selecting the objects to be segmented by interacting with a single view. It accepts intuitive user input, such as point clicks, coarse scribbles, or text. Using 3D Gaussian Splatting (3DGS) as the underlying scene representation simplifies the extraction of objects of interest which are considered to be a subset of the scene's Gaussians. Our key idea is to represent the scene as a graph and use the graph-cut algorithm to minimize an energy function to effectively partition the Gaussians into foreground and background. To achieve this, we construct a graph based on scene Gaussians and devise a segmentation-aligned energy function on the graph to combine user inputs with scene properties. To obtain an initial coarse segmentation, we leverage 2D image/video segmentation models and further refine these coarse estimates using our graph construction. Our empirical evaluations show the adaptability of GaussianCut across a diverse set of scenes.


Capability-Driven Skill Generation with LLMs: A RAG-Based Approach for Reusing Existing Libraries and Interfaces

da Silva, Luis Miguel Vieira, Köcher, Aljosha, König, Nicolas, Gehlhoff, Felix, Fay, Alexander

arXiv.org Artificial Intelligence

Modern automation systems increasingly rely on modular architectures, with capabilities and skills as one solution approach. Capabilities define the functions of resources in a machine-readable form and skills provide the concrete implementations that realize those capabilities. However, the development of a skill implementation conforming to a corresponding capability remains a time-consuming and challenging task. In this paper, we present a method that treats capabilities as contracts for skill implementations and leverages large language models to generate executable code based on natural language user input. A key feature of our approach is the integration of existing software libraries and interface technologies, enabling the generation of skill implementations across different target languages. We introduce a framework that allows users to incorporate their own libraries and resource interfaces into the code generation process through a retrieval-augmented generation architecture. The proposed method is evaluated using an autonomous mobile robot controlled via Python and ROS 2, demonstrating the feasibility and flexibility of the approach.


Between Help and Harm: An Evaluation of Mental Health Crisis Handling by LLMs

Arnaiz-Rodriguez, Adrian, Baidal, Miguel, Derner, Erik, Annable, Jenn Layton, Ball, Mark, Ince, Mark, Vallejos, Elvira Perez, Oliver, Nuria

arXiv.org Artificial Intelligence

Large language model-powered chatbots have transformed how people seek information, especially in high-stakes contexts like mental health. Despite their support capabilities, safe detection and response to crises such as suicidal ideation and self-harm are still unclear, hindered by the lack of unified crisis taxonomies and clinical evaluation standards. We address this by creating: (1) a taxonomy of six crisis categories; (2) a dataset of over 2,000 inputs from 12 mental health datasets, classified into these categories; and (3) a clinical response assessment protocol. We also use LLMs to identify crisis inputs and audit five models for response safety and appropriateness. First, we built a clinical-informed crisis taxonomy and evaluation protocol. Next, we curated 2,252 relevant examples from over 239,000 user inputs, then tested three LLMs for automatic classification. In addition, we evaluated five models for the appropriateness of their responses to a user's crisis, graded on a 5-point Likert scale from harmful (1) to appropriate (5). While some models respond reliably to explicit crises, risks still exist. Many outputs, especially in self-harm and suicidal categories, are inappropriate or unsafe. Different models perform variably; some, like gpt-5-nano and deepseek-v3.2-exp, have low harm rates, but others, such as gpt-4o-mini and grok-4-fast, generate more unsafe responses. All models struggle with indirect signals, default replies, and context misalignment. These results highlight the urgent need for better safeguards, crisis detection, and context-aware responses in LLMs. They also show that alignment and safety practices, beyond scale, are crucial for reliable crisis support. Our taxonomy, datasets, and evaluation methods support ongoing AI mental health research, aiming to reduce harm and protect vulnerable users.


QBR: A Question-Bank-Based Approach to Fine-Grained Legal Knowledge Retrieval for the General Public

Yuan, Mingruo, Kao, Ben, Wu, Tien-Hsuan

arXiv.org Artificial Intelligence

Retrieval of legal knowledge by the general public is a challenging problem due to the technicality of the professional knowledge and the lack of fundamental understanding by laypersons on the subject. Traditional information retrieval techniques assume that users are capable of formulating succinct and precise queries for effective document retrieval. In practice, however, the wide gap between the highly technical contents and untrained users makes legal knowledge retrieval very difficult. We propose a methodology, called QBR, which employs a Questions Bank (QB) as an effective medium for bridging the knowledge gap. We show how the QB is used to derive training samples to enhance the embedding of knowledge units within documents, which leads to effective fine-grained knowledge retrieval. We discuss and evaluate through experiments various advantages of QBR over traditional methods. These include more accurate, efficient, and explainable document retrieval, better comprehension of retrieval results, and highly effective fine-grained knowledge retrieval. We also present some case studies and show that QBR achieves social impact by assisting citizens to resolve everyday legal concerns.



LLM Reinforcement in Context

Rivasseau, Thomas

arXiv.org Artificial Intelligence

LLM alignment techniques currently struggle with enforcing desired characteristics and harmlessness of outputs over long conversational contexts and chains-of-thought. In this paper we present the scaling problem, a mathematical formulation of this difficulty, and propose interruptions as a means to achieve LLM alignment in scaling contexts. We call this reinforcement in context. Paper structure is as follows: section 1 is this introduction and section 2 presents the scaling problem. In section 3 we describe interruptions as a means to solve the alignment scaling problem. In section 4 we discuss consequences and limitations and in section 5 we highlight avenues for future research.


Live Music Models

Lyria Team, null, Caillon, Antoine, McWilliams, Brian, Tarakajian, Cassie, Simon, Ian, Manco, Ilaria, Engel, Jesse, Constant, Noah, Li, Yunpeng, Denk, Timo I., Lalama, Alberto, Agostinelli, Andrea, Huang, Cheng-Zhi Anna, Manilow, Ethan, Brower, George, Erdogan, Hakan, Lei, Heidi, Rolnick, Itai, Grishchenko, Ivan, Orsini, Manu, Kastelic, Matej, Zuluaga, Mauricio, Verzetti, Mauro, Dooley, Michael, Skopek, Ondrej, Ferrer, Rafael, Petridis, Savvas, Borsos, Zalán, Oord, Äaron van den, Eck, Douglas, Collins, Eli, Baldridge, Jason, Hume, Tom, Donahue, Chris, Han, Kehang, Roberts, Adam

arXiv.org Artificial Intelligence

We introduce a new class of generative models for music called live music models that produce a continuous stream of music in real-time with synchronized user control. We release Magenta RealTime, an open-weights live music model that can be steered using text or audio prompts to control acoustic style. On automatic metrics of music quality, Magenta RealTime outperforms other open-weights music generation models, despite using fewer parameters and offering first-of-its-kind live generation capabilities. We also release Lyria RealTime, an API-based model with extended controls, offering access to our most powerful model with wide prompt coverage. These models demonstrate a new paradigm for AI-assisted music creation that emphasizes human-in-the-loop interaction for live music performance.